[ETA Prediction with Graph Neural Networks in Google Maps](https://arxiv.org/abs/2108.11482)
ETA Prediction
In my math/physics brain this is a flow problem, but although cars are neither created nor destroyed, they do park quite often and resume driving at what may as well be random times.
More commonly, historical average speeds per segment, bucketed by hour of day and day of week (and, ideally, flagging holidays), are used to train speed predictions as a function of the most recent speeds (over various time offsets) across the graph.
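A minimal sketch of that baseline, with made-up bucket keys and a made-up blending weight (neither is from the paper): average speeds per (segment, day-of-week, hour) bucket, then blend the bucket average with the most recent observation.

```python
from collections import defaultdict

def build_historical_averages(observations):
    """observations: iterable of (segment_id, day_of_week, hour, speed_kmh)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seg, dow, hour, speed in observations:
        key = (seg, dow, hour)
        sums[key] += speed
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def predict_speed(historical, seg, dow, hour, recent_speed=None, weight=0.7):
    """Blend the most recent observed speed with the historical bucket average."""
    baseline = historical.get((seg, dow, hour))
    if recent_speed is None:
        return baseline
    if baseline is None:
        return recent_speed
    return weight * recent_speed + (1 - weight) * baseline

obs = [("s1", 0, 8, 30.0), ("s1", 0, 8, 50.0), ("s1", 0, 9, 60.0)]
hist = build_historical_averages(obs)
print(predict_speed(hist, "s1", 0, 8))                     # bucket average: 40.0
print(predict_speed(hist, "s1", 0, 8, recent_speed=20.0))  # 0.7*20 + 0.3*40 = 26.0
```

Real systems use far richer features, but the bucket-then-blend structure is the core of the historical-average approach.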
For a more in-depth review of ETA modeling see this review suggested by the paper.
The traffic measurements at various nodes (streets, intersections) are not necessarily aligned time series, an interesting complication that many models cannot handle but this one can. Also:
There exist many patterns inherent to the road network topology that may be particularly pertinent to forecasting traffic flow. Similar motifs in the road network of the same region are likely to correspond to similar traffic dynamics—as such, automatically discovering and compressing them is likely to be useful.
–from Section 2. RELATED WORK
Weather Prediction
Note that Google came out with a GNN approach for weather prediction in 2022 (the next year).
Graph Neural Networks
Graph Neural Networks (GNNs) are a family of deep learning models that operate on graphs.
Graphs are a data structure that represent relationships or connections (edges) between different entities (nodes).
GNNs can be used for classification and regression of nodes, edges or the entire graph; they are often used to train node or edge embeddings for downstream tasks, possibly independent of the original graph. Nodes and edges can have features and there can be graph-level features as well.
Social networks are perhaps among the most commonly studied examples of graphs, where nodes represent users and edges correspond to friendship relations between them: node features are user properties such as age, profile picture, etc.; edge features can include the current status of the relationship, its duration, the frequency of interactions, etc.
Unlike traditional deep learning models that process regularly structured data such as images (grids) or text (sequences), GNNs can handle non-Euclidean data and capture both local and global information from the graph structure. The basic idea behind GNNs is to propagate information along the edges of the graph through multiple layers, updating each node’s representation based on its neighbors’ representations in the previous layer. This allows the model to learn complex patterns and relations between nodes.
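One propagation step can be sketched in a few lines of NumPy: each node's new representation is a learned function of its own features and an aggregate of its neighbors' features. The weight matrices are random here; in a real model they are trained.

```python
import numpy as np

def gnn_layer(adj, x, w_self, w_neigh):
    """adj: (n, n) adjacency matrix; x: (n, d) node features."""
    deg = adj.sum(axis=1, keepdims=True)
    deg = np.maximum(deg, 1)           # avoid division by zero for isolated nodes
    neigh_mean = (adj @ x) / deg       # average the neighbours' features
    return np.tanh(x @ w_self + neigh_mean @ w_neigh)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
x = rng.normal(size=(3, 4))
h = gnn_layer(adj, x, rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(h.shape)  # (3, 8)
```

Stacking such layers lets information flow multiple hops across the graph.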
The key structural property of graphs is that the nodes in V are usually not
assumed to be provided in any particular order, and thus any operations
performed on graphs should not depend on the ordering of nodes. The
desirable property that functions acting on graphs should satisfy is thus
permutation invariance, and it implies that for any two isomorphic graphs, the
outcomes of these functions are identical.
–from Geometric Deep Learning, section 4.1
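Permutation invariance can be checked concretely: a graph-level readout built from a permutation-invariant aggregator (here, a sum) returns the same output when the nodes and the adjacency matrix are relabeled consistently. This toy readout is my own illustration, not from the book.

```python
import numpy as np

def graph_readout(adj, x):
    messages = adj @ x                  # sum-aggregate neighbour features
    return (x + messages).sum(axis=0)   # order-independent pooling over nodes

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.arange(6, dtype=float).reshape(3, 2)

perm = np.array([2, 0, 1])              # relabel the nodes
p = np.eye(3)[perm]                     # corresponding permutation matrix
out1 = graph_readout(adj, x)
out2 = graph_readout(p @ adj @ p.T, p @ x)
print(np.allclose(out1, out2))  # True: isomorphic graphs give identical outputs
```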
A visualisation of the dataflow for the three flavours of GNN layers, g. We use the neighbourhood of node b to illustrate this. Left-to-right: convolutional, where sender node features are multiplied with a constant, cuv; attentional, where this multiplier is implicitly computed via an attention mechanism of the receiver over the sender: αuv = a(xu, xv); and message-passing, where vector-based messages are computed based on both the sender and receiver: muv = ψ(xu, xv).
In the convolutional flavours, the features of the neighbourhood nodes are directly
aggregated with fixed weights,
Here, cuv specifies the importance of node v to node u’s representation. It
is a constant that often directly depends on the entries in A representing
the structure of the graph.
In the attentional flavours, the interactions are implicit
Here, a is a learnable self-attention mechanism that computes the importance
coefficients αuv = a(xu, xv) implicitly.
The message-passing flavour amounts to computing arbitrary vectors (“messages”) across edges,
Here, ψ is a learnable message function, computing v’s vector sent to u, and the
aggregation can be considered as a form of message passing on the graph.
One important thing to note is a representational containment between these
approaches: convolution ⊆ attention ⊆ message-passing.
–from Geometric Deep Learning, section 5.3
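The three flavours differ only in how the per-edge weighting is obtained, which the following sketch makes concrete for a single receiver node u with neighbour features `x_nbr` of shape (k, d). The score and message functions here are illustrative assumptions, not the book's exact parameterisations.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def convolutional(x_nbr, c):
    """Fixed coefficients c_uv (e.g. 1/degree), independent of the features."""
    return (c[:, None] * x_nbr).sum(axis=0)

def attentional(x_u, x_nbr, w):
    """Coefficients alpha_uv computed from the features via a(x_u, x_v)."""
    scores = x_nbr @ w @ x_u            # bilinear attention score (an assumption)
    return (softmax(scores)[:, None] * x_nbr).sum(axis=0)

def message_passing(x_u, x_nbr, w_msg):
    """Arbitrary per-edge message vectors psi(x_u, x_v), then aggregated."""
    pairs = np.concatenate([np.broadcast_to(x_u, x_nbr.shape), x_nbr], axis=1)
    return np.tanh(pairs @ w_msg).sum(axis=0)

rng = np.random.default_rng(1)
d = 4
x_u, x_nbr = rng.normal(size=d), rng.normal(size=(3, d))
out_conv = convolutional(x_nbr, np.full(3, 1 / 3))
out_attn = attentional(x_u, x_nbr, rng.normal(size=(d, d)))
out_msg = message_passing(x_u, x_nbr, rng.normal(size=(2 * d, d)))

# Containment in action: attention with a zero score function degenerates to
# uniform coefficients, i.e. convolutional (mean) aggregation.
print(np.allclose(attentional(x_u, x_nbr, np.zeros((d, d))), x_nbr.mean(axis=0)))  # True
```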
The paper uses a “message-passing” style of GNN, specifically following the encoder-decoder framework described in Relational inductive bias for physical construction in humans and machines (2018). Essentially, they learn node and edge embeddings as well as a global graph embedding; all updates act in the embedding space.
The “MNIST” of GNNs is Zachary’s karate club. From jraph (the JAX GNN library):
Here we train a graph neural network to process Zachary’s karate club.
Zachary’s karate club is used in the literature as an example of a social graph.
Here we use a graphnet to optimize the assignments of the students in the
karate club to two distinct karate instructors (Mr. Hi and John A).
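Not the jraph demo itself, but a minimal NumPy sketch of the same task, assuming `networkx` is installed: seed the two instructors (node 0 = Mr. Hi, node 33 = John A), propagate soft club assignments over the graph, and compare against the ground-truth `club` attribute networkx stores for each student.

```python
import numpy as np
import networkx as nx

g = nx.karate_club_graph()
n = g.number_of_nodes()
adj = nx.to_numpy_array(g)
deg = adj.sum(axis=1, keepdims=True)

scores = np.zeros((n, 2))
scores[0, 0] = 1.0    # Mr. Hi
scores[33, 1] = 1.0   # John A

for _ in range(20):   # simple label propagation, a stand-in for the trained GNN
    scores = 0.5 * scores + 0.5 * (adj @ scores) / deg
    scores[0], scores[33] = [1.0, 0.0], [0.0, 1.0]   # clamp the instructors

pred = scores.argmax(axis=1)
truth = np.array([0 if g.nodes[v]["club"] == "Mr. Hi" else 1 for v in g])
accuracy = (pred == truth).mean()
print(f"{accuracy:.2f}")  # most students are assigned to their actual club
```

Even this crude propagation recovers the split well, which is why the karate club makes such a good sanity check for graph models.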
Further Reading / References
A Gentle Introduction to Graph Neural Networks and follow-up Understanding Convolutions on Graphs
Graph Neural Networks: A Review of Methods and Applications
Training
MetaGradients
Introduced in RL. Essentially, you slowly learn your hyperparameters while learning your model. We should read the RL paper if we want more details.
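An illustrative sketch of the idea, not the paper's exact recipe: treat the learning rate itself as a learnable parameter and update it with the gradient of the post-step loss with respect to it. For the toy loss L(w) = 0.5 * w**2 the derivatives are written out by hand.

```python
w, lr, meta_lr = 5.0, 0.01, 0.001
for _ in range(200):
    g = w                      # dL/dw for L = 0.5 * w**2
    w_next = w - lr * g        # inner (model) update
    meta_g = w_next * (-g)     # dL(w_next)/dlr, by the chain rule
    lr -= meta_lr * meta_g     # outer (hyperparameter) update
    w = w_next

print(abs(w) < 0.1, lr > 0.01)  # True True: loss minimised, lr adapted upward
```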
Exponential Moving Average
The hardest aspect of training was the high variance of model performance throughout training, something common to RL. To combat this, they applied an exponential moving average to all model parameters for later evaluation and serving.
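A minimal sketch of parameter EMA: keep a shadow copy of the parameters that is updated after every optimiser step and used for evaluation/serving. The decay value is illustrative.

```python
import numpy as np

def ema_update(shadow, params, decay=0.99):
    return {k: decay * shadow[k] + (1 - decay) * params[k] for k in params}

params = {"w": np.array([1.0, 2.0])}
shadow = {k: v.copy() for k, v in params.items()}

for step in range(3):                  # fake training steps
    params["w"] = params["w"] + 1.0    # stand-in for an optimiser update
    shadow = ema_update(shadow, params)

print(params["w"], shadow["w"])  # the shadow copy lags behind the raw parameters
```

The averaged copy smooths out the step-to-step variance, which is exactly what makes it preferable for serving.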
Segments, Super-Segments & Extended Super-Segments
Segments are ~50–100 m long; super-segments are ~20 segments. Modeling with extended super-segments allowed them to adjust ETA estimates for traffic farther away that might still affect your route.
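Composing an ETA over a super-segment can be sketched as summing each segment's length over its predicted speed; the lengths and speeds below are made up for illustration.

```python
def eta_seconds(segments):
    """segments: list of (length_m, predicted_speed_mps) pairs."""
    return sum(length / speed for length, speed in segments)

# A super-segment of ~20 short road segments, each 50-100 m long:
super_segment = [(75.0, 10.0)] * 20   # 75 m at 10 m/s each
print(eta_seconds(super_segment))     # 150.0 seconds
```

Extended super-segments refine the per-segment speed predictions; the final ETA is still an aggregation of this form over the route.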